chinaXiv: 202211.00376v1
DATA PAPER 


A Prior Information Enhanced Extraction Framework for 
Document-level Financial Event Extraction 


Haitao Wang, Tong Zhu, Mingtao Wang, Guoliang Zhang & Wenliang Chen†


School of Computer Science and Technology, Soochow University, Suzhou 215006, China 


Keywords: Event extraction; Information extraction; Financial event; Event detection; Event argument extraction 


Citation: Wang, H.T., et al.: A prior information enhanced extraction framework for document-level financial event extraction. 
Data Intelligence 3(3), 460-476 (2021). doi: 10.1162/dint_a_00103 
Received: February 11, 2021; Revised: April 8, 2021; Accepted: May 18, 2021 


ABSTRACT 


Document-level financial event extraction (DFEE) is the task of detecting events and extracting the
corresponding event arguments in financial documents, which plays an important role in information
extraction in the financial domain. This task is challenging because financial documents are generally long
and the arguments of one event may be scattered across different sentences. To address this issue, we propose
a novel Prior Information Enhanced Extraction framework (PIEE) for DFEE, leveraging prior information from
both event types and pre-trained language models. Specifically, PIEE consists of three components: event
detection, event argument extraction, and event table filling. In event detection, we identify the event type.
Then, the event type is explicitly used for event argument extraction. Meanwhile, the implicit information
within language models also provides considerable cues for event argument localization. Finally, all the
event arguments are filled into an event table by a set of predefined heuristic rules. To demonstrate the
effectiveness of our proposed framework, we participated in the shared task of CCKS2020 Task 4-2: Document-level
Event Arguments Extraction. On both Leaderboard A and Leaderboard B, PIEE took first place and
significantly outperformed the other systems.


1. INTRODUCTION 


Event Extraction (EE) aims to identify different types of events and their corresponding arguments in text. 
In the financial domain, EE provides valuable structured information for investment analysis and asset 
management. To promote financial event extraction, the 14th China Conference on Knowledge Graph and 
Semantic Computing (CCKS2020) set up Task 4-2① for document-level financial event extraction (DFEE). The
organizer collected documents from financial news and announcements, and required the participants to 
identify the event types and extract event arguments from the documents. 


In recent years, event extraction has attracted increasing attention due to its wide range of applications, and significant
efforts have been devoted to it. However, most existing studies only extract arguments within the scope of a single
sentence [1, 2, 3], which is dubbed sentence-level EE (SEE). For document-level EE, these methods provide suboptimal
solutions because the event arguments are often scattered across different sentences in a document,
and global information should be exploited to enhance the model. As shown in Figure 1, most of the documents
contain more than 500 Chinese characters. Under this circumstance, processing each sentence in the document
independently destroys the integrity of events. Therefore, a document-level EE framework is vital
for extracting events from such long documents.
Figure 1. The text length distribution of data in CCKS2020 Task 4-2.


In this paper, we propose a Prior Information Enhanced Extraction framework (PIEE) for document-level
financial event extraction, which can be decomposed into three steps: event detection, event argument
extraction, and event table filling. Specifically, in event detection we first identify the event type of the
document. Then, we use the event type as prior information for sentence-level event argument extraction.
We explore three paradigms for event argument extraction, and all three obtain consistent performance
improvements with the prior type information. Moreover, inspired by the recent success of pre-trained
language models (PLMs), which are trained on large corpora and provide implicit prior information, we
explore different language models for event argument extraction. Finally, event table filling integrates
all event arguments extracted from different sentences by a set of heuristic rules.


① https://www.biendata.xyz/competition/ccks_2020_4_2/
Our contributions are summarized as follows:


• We propose a novel prior information enhanced extraction framework (PIEE) for document-level
financial event extraction, which comprises three steps: event detection, event argument
extraction and event table filling.

• We use the event type as explicit prior information for sentence-level event argument extraction.
Meanwhile, we explore the implicit prior information in different language models for event argument
extraction.

• In CCKS2020 Task 4-2, our system achieved an F1-score of 0.83007 on Leaderboard A and 0.66996
on Leaderboard B, ranking first on both.


2. RELATED WORK 


Event extraction has made great progress in recent years. However, most research [4, 5, 6] has focused on
sentence-level event extraction (SEE), while document-level event extraction (DEE) has received less attention.
Yang et al. [7] and Zheng et al. [8] proposed two different frameworks for DEE. The former (DCFEE)
extracts event arguments in the SEE manner and combines the SEE results into DEE through a key event
detection and argument-completion strategy, which depends on event triggers. The latter establishes
an end-to-end framework, Doc2EDAG, based on multiple transformer models, and exploits an entity-based
directed acyclic graph to perform DEE without any elaborately designed rules. However,
Doc2EDAG suffers from a complex structure, low efficiency, and high resource consumption.


In the stage of event argument extraction, both methods regard it as a sequence labeling problem similar
to NER, for which BiLSTM-CRF [9] is a classic model. Beyond that, with the successful
application of machine reading comprehension (MRC) to many NLP problems [10, 11], MRC has also been applied
to NER, with the advantage that the query supplies significant prior information about the entity category.
Recently, Yu et al. [12] applied the Biaffine model to NER and achieved state-of-the-art performance on eight corpora.


In addition, compared to GloVe [13] and ELMo [14], the more recent language model BERT can capture more
contextual and semantic information from texts. To mitigate the drawbacks of the masking strategy in BERT,
BERT-wwm [15] uses Whole Word Masking (WWM), and ERNIE [16] designs entity-level and
phrase-level masking strategies to integrate external knowledge. RoBERTa [17] further proposes a dynamic
masking strategy and removes the next sentence prediction task. NEZHA [18] additionally employs relative
positional encoding to enhance its encoding ability.


Inspired by the above work, we propose a prior information enhanced extraction framework for
document-level financial event extraction. In contrast to DCFEE and Doc2EDAG, we first detect the event types
in texts, which helps identify the event arguments in the subsequent stages. To improve the performance
of event argument extraction, advanced NER techniques and recent language models are also
introduced into our model. Furthermore, structurally, our framework is simpler and faster, and it
does not require event triggers.
3. DATA


This section presents our data analysis and describes how the data are preprocessed.


3.1 Data Analysis 


To gain a comprehensive understanding of the data in the shared task, we compiled statistics.
Figure 2 presents the co-occurrence distribution of different event types in the training data,
including Bankruptcy Liquidation (BL), Equity Freeze (EF), Equity Underweight (EU), Equity Overweight
(EO), Equity Pledge (EP), Asset Loss (AL), Accident (AC), Leader Death (LD), and External Indemnity (EI).
We observe that all the events in one document share the same event type. This observation greatly
simplifies event type identification.


[Figure 2 heatmap: both axes list the event types BL, EF, EU, EO, EP, AL, AC, LD, EI; color bar ticks range from 200 to 1000.]
Figure 2. Co-occurrence distribution of event types in training data. 


Figure 3 further shows the number of documents and instances for each event type. The event types
fall into two categories: in one, such as Bankruptcy Liquidation, the event occurs only once in a
document; in the other, such as Equity Pledge, the event can occur more than once in the same
document. This observation also guides the subsequent event table filling.


In summary, we can draw the following two conclusions: 


• Each document contains only one type of event.
• Documents describing BL, AL, AC, LD and EI contain only one event, whereas documents
describing EU, EO, EF and EP usually contain more than one event.
Figure 3. Number of documents and instances in each event type.


3.2 Data Preprocessing 


The data of this evaluation task mainly come from financial announcements and news on the Internet. 
Inevitably, the crawled texts contain noise. Thus, it is necessary to clean the data for better system
construction.


As shown in Table 1, the original data contain HTML escape symbols and tags, which hinder the
system's semantic understanding of texts. We restore them to the corresponding characters, except for <br>,
which is replaced with a single space because \n serves as a special delimiter when splitting documents.


Table 1. Escape symbols and tags of HTML in the evaluation data. 


Escape/tag:    &nbsp     &quot   &apos   &amp   &gt   &lt   <br>
Restored as:   (space)   "       '       &      >     <     \n
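The restoration in Table 1 can be sketched with Python's standard `html` module. This is a minimal sketch under our own assumptions, since the paper does not name its tooling: `<br>` variants are matched case-insensitively and replaced with a space, entities written without a trailing semicolon (as in Table 1) are normalized first, and the non-breaking space produced by `&nbsp;` is mapped to an ordinary space. The function name `restore_html` is hypothetical.

```python
import html
import re

def restore_html(text: str) -> str:
    """Restore the HTML escapes of Table 1; <br> becomes a single space (hypothetical helper)."""
    # Replace <br>, <br/>, <br /> first so the tag never reaches html.unescape.
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    # Table 1 lists entities without ';' (e.g. "&nbsp"); add it so that
    # html.unescape treats both spellings the same way.
    text = re.sub(r"&(nbsp|quot|apos|amp|gt|lt)(?!;)", r"&\1;", text)
    text = html.unescape(text)
    # html.unescape maps &nbsp; to U+00A0; use an ordinary space instead.
    return text.replace("\u00a0", " ")
```

For example, `restore_html("A&nbspB<br>C")` yields `"A B C"`. Keeping `\n` out of the restored text is what lets it act as a clean sentence delimiter later.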


Moreover, to shorten the texts as much as possible, consecutive repeated punctuation,
extra spaces and Web links are removed. We also converted traditional Chinese characters into simplified ones, and
converted punctuation from SBC case (full-width) to DBC case (half-width) to construct more standardized data. Finally,
all documents are divided into multiple sentences with a maximum length of 500 Chinese characters, and
event arguments in the sentences of the training data are tagged with the BIO (Begin, Inside, Outside) scheme.
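The splitting and tagging steps above can be sketched as follows. The paper only fixes the 500-character limit and the BIO scheme; the split points (Chinese sentence-final punctuation or a newline), the greedy packing of consecutive pieces, and the rule of tagging every non-overlapping occurrence of an argument (longer arguments first) are our assumptions. The names `split_document` and `bio_tags` are hypothetical.

```python
import re

MAX_LEN = 500  # maximum sentence length in characters (Section 3.2)

def split_document(doc: str, max_len: int = MAX_LEN) -> list:
    """Split a document into chunks of at most max_len characters.

    Assumption: split after Chinese sentence-final punctuation or a newline,
    then greedily pack consecutive pieces into chunks.
    """
    pieces = [p for p in re.split(r"(?<=[。！？；\n])", doc) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        # A single over-long piece is hard-cut into max_len windows.
        while len(piece) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_len])
            piece = piece[max_len:]
        if len(current) + len(piece) > max_len:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks

def bio_tags(sentence: str, args: list) -> list:
    """Character-level BIO tags for a list of (argument_text, role) pairs.

    Assumption: every occurrence is tagged, longer arguments first, and an
    occurrence overlapping an already tagged span is skipped.
    """
    tags = ["O"] * len(sentence)
    for text, role in sorted(args, key=lambda a: -len(a[0])):
        if not text:
            continue
        start = sentence.find(text)
        while start != -1:
            end = start + len(text)
            if all(t == "O" for t in tags[start:end]):
                tags[start] = f"B-{role}"
                for k in range(start + 1, end):
                    tags[k] = f"I-{role}"
            start = sentence.find(text, start + 1)
    return tags
```

Tagging longer strings first keeps a short argument (for instance, a company abbreviation) from splitting a longer argument that contains it.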


4. METHODOLOGY 


In this section, we introduce the details of our proposed framework. First, we detected
which event type is described in each document. Then, we treated event argument extraction as a
sequence labeling problem. Finally, heuristic strategies were applied to fill in the event tables.
4.1 Event Detection


In the research of distantly supervised relation extraction, Riedel et al. [19] assumed that if two entities have
a relation, at least one of the sentences containing those two entities expresses that relation.
Inspired by this classical assumption, we also assumed that if a document contains an event type, at least one
sentence from this document can fully describe that event type.


In previous research on event extraction, event triggers are often used to recognize the event type.
However, trigger words are not explicitly provided in real scenarios. We assumed that a document
describing an event implicitly contains at least one trigger word, and that the sentence containing this trigger
word is sufficient to identify the event type described in the document. Under this assumption, each
document can be regarded as a bag of sentences.


Figure 4 shows the architecture of event detection. Sentences from the same document $\{s_1, s_2, \ldots, s_n\}$ are
first transformed into distributed representations by looking up pre-trained character embeddings. Then, a
sentence encoder such as a CNN or LSTM is applied to extract deep semantic features $\{h_1, h_2, \ldots, h_n\}$ for
text classification. Similar to the research in relation extraction, sentences from the same document are
regarded as one bag, and there are three strategies to represent a document $d$: ONE (at least one sentence),
ATT (selective attention over sentences), and MAX (cross-sentence max pooling).




Figure 4. The architecture of event detection. 


4.1.1 ONE 


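Zeng et al. [20] selected the most valuable sentence to represent the whole sentence bag $d$, where the
sentence with the highest probability is defined as follows:

$$o_j = W h_j + b, \qquad j^{*} = \arg\max_{j} \frac{\exp(o_{j,e})}{\sum_{k=1}^{n_e} \exp(o_{j,k})}, \qquad d = h_{j^{*}} \tag{1}$$

where $W \in \mathbb{R}^{n_e \times h_s}$, $n_e$ is the number of event types, $h_s$ is the size of the hidden units, and $e$ is the target event type.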
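Equation (1) can be sketched in plain Python as below. Vectors are lists of floats, and `e` is the index of the target (gold) event type, as it would be during training. The helper names (`softmax`, `linear`, `one_select`) are hypothetical, and this is an illustration of the ONE selection rule rather than the authors' code.

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [x / total for x in exps]

def linear(W, h, b):
    """o = W h + b, with W given as n_e rows of length h_s."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]

def one_select(H, W, b, e):
    """ONE strategy, Eq. (1): return (j*, d = h_{j*}).

    H: list of n sentence vectors h_j; e: index of the target event type.
    """
    probs = [softmax(linear(W, h, b))[e] for h in H]
    j_star = max(range(len(H)), key=lambda j: probs[j])
    return j_star, H[j_star]
```

Only the selected sentence contributes to `d`, which is why ONE discards the information in all other sentences of the bag.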
4.1.2 ATT


Following Lin et al. [21], to exploit the information of all available sentences, we can use the attention
mechanism to aggregate sentence-level features. The score $a_i$, which measures how well the input sentence $s_i$
matches the target event type $e$, is obtained by the following equation:

$$a_i = h_i^{\top} W_a r_e \tag{2}$$

where $W_a$ is a weighted diagonal matrix and $r_e$ is the representation of event type $e$.

Then, the representation of the document $d$ is computed as a weighted sum of sentence-level features:

$$\alpha_i = \frac{\exp(a_i)}{\sum_{k=1}^{n} \exp(a_k)}, \qquad d = \sum_{i=1}^{n} \alpha_i h_i \tag{3}$$
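Equations (2) and (3) can be sketched as follows. Because $W_a$ is diagonal, it is stored as the list of its diagonal entries. The function name `att_pool` and this representation are our assumptions, not details given in the paper.

```python
import math

def att_pool(H, w_diag, r_e):
    """ATT strategy, Eqs. (2)-(3).

    H: list of n sentence vectors h_i; w_diag: diagonal of W_a;
    r_e: representation of the target event type e.
    """
    # a_i = h_i^T diag(w_diag) r_e
    scores = [sum(h_k * w_k * r_k for h_k, w_k, r_k in zip(h, w_diag, r_e)) for h in H]
    # alpha_i: softmax over sentences (max subtracted for numerical stability)
    m = max(scores)
    exps = [math.exp(a - m) for a in scores]
    total = sum(exps)
    alpha = [x / total for x in exps]
    # d = sum_i alpha_i h_i
    return [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]
```

When all scores are equal, `d` is simply the mean of the sentence vectors. A sharply peaked score distribution makes `d` approach the single best-matching sentence, as in ONE.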


4.1.3 MAX 


Jiang et al. [22] claimed that critical information can also be inferred implicitly from all sentences, so a
max pooling operation is employed to capture the most valuable features in various aspects from all
sentences. Formally, the document-level feature $d$ is computed element-wise as follows:

$$d = \max(h_1, h_2, \ldots, h_n) \tag{4}$$


Finally, the event type is predicted from the document representation $d$, and cross-entropy is used as the
objective function to optimize the models.
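Equation (4) and the training objective can be sketched together. We assume a single linear layer maps $d$ to event-type logits. The paper does not specify the classifier head, so `document_loss` and its parameters are hypothetical.

```python
import math

def max_pool(H):
    """MAX strategy, Eq. (4): element-wise maximum over sentence vectors."""
    return [max(column) for column in zip(*H)]

def cross_entropy(logits, gold):
    """-log softmax(logits)[gold], using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold]

def document_loss(H, W, b, gold):
    """Pool a sentence bag with MAX, score event types linearly, return the loss."""
    d = max_pool(H)
    logits = [sum(w * x for w, x in zip(row, d)) + bias for row, bias in zip(W, b)]
    return cross_entropy(logits, gold)
```

Each dimension of `d` may come from a different sentence, which is how MAX combines evidence scattered across the document.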


4.2 Event Argument Extraction 


For event argument extraction, many classic sequence labeling methods can be used to extract
event arguments from texts. To make full use of the prior information of the event type, we concatenated
each sentence with the representation of the corresponding event type before encoding. Thus, all sentences from
the same document share the same event type predicted by event detection. Based on such input
representations, we proposed three PLM-based architectures for sentence-level event argument extraction:
PLM-CRF, PLM-MRC, and PLM-Biaffine.
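The concatenated input can be sketched as a BERT-style sentence pair: `[CLS] event type [SEP] sentence [SEP]`. The exact layout, the 512-token limit, and the character-level tokenization are our assumptions. Chinese BERT vocabularies largely split Chinese text into single characters. The function `build_input` is hypothetical and does not call any real tokenizer.

```python
def build_input(event_type: str, sentence: str, max_len: int = 512):
    """Hypothetical pair input: [CLS] event type [SEP] sentence [SEP].

    Returns tokens, segment ids, and the index of the first sentence token,
    so the PLM outputs x_1..x_l can be sliced out after encoding.
    """
    tokens = ["[CLS]", *event_type, "[SEP]"]   # one token per Chinese character
    offset = len(tokens)
    budget = max_len - offset - 1              # keep room for the final [SEP]
    tokens += list(sentence[:budget]) + ["[SEP]"]
    segment_ids = [0] * offset + [1] * (len(tokens) - offset)
    return tokens, segment_ids, offset
```

All sentences of one document would be paired with the same predicted type, e.g. `[build_input(pred_type, s) for s in sentences]`. For PLM-MRC, the role query would take the place of `event_type`.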


4.2.1 PLM-CRF 


BiLSTM-CRF is a classic model for the NER task and once achieved state-of-the-art accuracy.
Since pre-trained language models such as BERT can capture deeper semantic and contextual
information, the input sequence of the PLM in our PLM-CRF consists of the event type and the sentence. With the
help of multiple transformer layers in the PLM, the sentence can fully interact with the prior information.
Given the output of the PLM $\{r_1, r_2, \ldots, r_m, x_1, x_2, \ldots, x_l\}$, where $r_i$ is the output for the event type and
$x_i$ is the output for the sentence, $X = \{x_1, x_2, \ldots, x_l\}$ is then used as the input of the CRF layer. For a sequence
of predictions $y = \{y_1, y_2, \ldots, y_l\}$, we define its score as in Equation (5):

$$s(X, y) = \sum_{i=0}^{l} A_{y_i, y_{i+1}} + \sum_{i=1}^{l} (x_i W)_{y_i} \tag{5}$$

where $A \in \mathbb{R}^{(n_t+2) \times (n_t+2)}$ is a matrix of transition scores (the two extra tags are the start tag $y_0$ and
the end tag $y_{l+1}$), $W \in \mathbb{R}^{h \times n_t}$ is used to calculate the score of each label for each token, $n_t$ is the
number of BIO tags and $h$ is the hidden size of the PLM.


During training, we maximized the log-probability of the correct tag sequence. In the testing stage, we
used the Viterbi algorithm to decode the sequence.
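Equation (5) and Viterbi decoding can be sketched as follows. `P[i][t]` stands for the emission score $(x_i W)_t$, and the start and end tags occupy the last two rows and columns of `A`. This index convention is our assumption. Training additionally needs the forward algorithm to compute the partition function, which is not shown here.

```python
def crf_score(P, A, y):
    """Eq. (5): transition scores (with start/end tags) plus emission scores.

    P: l x n_t emission scores; A: (n_t+2) x (n_t+2) transitions; y: tag ids.
    """
    n_t = len(P[0])
    START, END = n_t, n_t + 1
    path = [START] + list(y) + [END]
    trans = sum(A[path[i]][path[i + 1]] for i in range(len(path) - 1))
    emit = sum(P[i][t] for i, t in enumerate(y))
    return trans + emit

def viterbi(P, A):
    """Return the tag sequence with the highest score under Eq. (5)."""
    n_t = len(P[0])
    START, END = n_t, n_t + 1
    # best[t]: best score of a prefix ending in tag t; back[i][t]: its predecessor.
    best = [A[START][t] + P[0][t] for t in range(n_t)]
    back = []
    for i in range(1, len(P)):
        prev, ptr, best = best, [], []
        for t in range(n_t):
            k = max(range(n_t), key=lambda s: prev[s] + A[s][t])
            ptr.append(k)
            best.append(prev[k] + A[k][t] + P[i][t])
        back.append(ptr)
    last = max(range(n_t), key=lambda t: best[t] + A[t][END])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Viterbi runs in $O(l \cdot n_t^2)$ time, whereas enumerating all $n_t^l$ tag sequences is exponential in the sentence length.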


Figure 5. The architecture of event argument extraction.


4.2.2 PLM-MRC 


At present, many NLP tasks can be converted into machine reading comprehension (MRC) problems.
Inspired by Li et al. [23], we proposed a simplified version of MRC to address event argument extraction.


First, we manually constructed a query for each event role in different event types. For example,
for Pledgor in Equity Pledge, the corresponding query is “who is the pledgor in equity pledge”. Similar to
PLM-CRF, we also concatenated the query and the sentence before PLM encoding.


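Then, given the sentence representation $X = \{x_1, x_2, \ldots, x_l\}$ output by the PLM, we compute the
probabilities of each token being a start index and an end index, respectively, as follows:

$$P_s = \mathrm{softmax}(X W_s + b_s), \qquad P_e = \mathrm{softmax}(X W_e + b_e) \tag{6}$$

where $X \in \mathbb{R}^{l \times h}$ stacks the token representations, $W_s, W_e \in \mathbb{R}^{h \times 2}$, $h$ is the hidden size of
the PLM, and the softmax is applied to each token over the two classes (start or not, end or not).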
In the prediction stage, every pair of a start index and a following (or identical) end index with no other
start or end index between them is regarded as the span of an event argument.
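The pairing rule can be sketched as follows. It takes per-token start/end decisions (e.g. the argmax of $P_s$ and $P_e$) and pairs each start with the nearest end at or after it. The pair is rejected if another start lies strictly between them. Treating "between" as strict is our reading of the rule, and `decode_spans` is a hypothetical name.

```python
def decode_spans(start_flags, end_flags):
    """Pair start/end indices with no other start or end index between them.

    start_flags, end_flags: per-token truthy values marking predicted
    start and end positions. Returns inclusive (start, end) spans.
    """
    starts = [i for i, f in enumerate(start_flags) if f]
    ends = [i for i, f in enumerate(end_flags) if f]
    spans = []
    for s in starts:
        # Nearest end at or after s, so no end index lies between s and e.
        e = next((j for j in ends if j >= s), None)
        if e is None:
            continue
        # Reject the pair if another start lies strictly between s and e.
        if any(s < s2 < e for s2 in starts):
            continue
        spans.append((s, e))
    return spans
```

Because the role is already encoded in the query, each decoded span is directly assigned the role that the query asks about.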


4.2.3 PLM-Biaffine 


The Biaffine model is widely used in dependency parsing [24] and Yu et al. [12] first applied this 
architecture to address the NER task. Following their work, we also used the Biaffine model to extract event 
arguments in texts. 


As in PLM-CRF, we first obtained the sentence representation $X = \{x_1, x_2, \ldots, x_l\}$ from the
PLM. After that, two feedforward neural networks (FFNNs) were used to generate the representations of the
start and end of the spans. Then a Biaffine model was applied to predict possible event roles for each span,
including a special role named NA, which means that the current span is not a valid event argument.
Specifically, the score of event roles for span $\langle i, j \rangle$ was computed as follows:

$$h_i^s = W_s x_i + b_s, \qquad h_j^e = W_e x_j + b_e, \qquad s(i, j) = (h_i^s)^{\top} U h_j^e + W_m (h_i^s \oplus h_j^e) + b \tag{7}$$

where $h_i^s$ and $h_j^e$ are the start and end representations of tokens $i$ and $j$, $\oplus$ denotes concatenation, and
$s(i, j)$ is the score distribution for span $\langle i, j \rangle$ over $n_r$ event roles. $W_s, W_e \in \mathbb{R}^{d_f \times h}$,
$U \in \mathbb{R}^{d_f \times n_r \times d_f}$, $W_m \in \mathbb{R}^{n_r \times 2d_f}$ and $b \in \mathbb{R}^{n_r}$ are trainable parameters in the
Biaffine model, where $d_f$ is the output size of the FFNNs.
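The per-span score in Equation (7) can be sketched with plain lists. `U` is indexed as `U[a][r][c]`, matching the shape $d_f \times n_r \times d_f$. The FFNNs are written as single affine maps, as in the equation. The function names are hypothetical.

```python
def affine(W, x, b):
    """h = W x + b (the FFNN projection in Eq. (7), written as one affine map)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def biaffine_scores(hs, he, U, Wm, b):
    """Eq. (7) for one span <i, j>: one score per event role (including NA).

    hs, he: start/end vectors of size d_f; U: d_f x n_r x d_f;
    Wm: n_r rows of length 2*d_f; b: n_r biases.
    """
    concat = hs + he  # h_i^s ⊕ h_j^e as list concatenation
    scores = []
    for r in range(len(b)):
        bilinear = sum(hs[a] * U[a][r][c] * he[c]
                       for a in range(len(hs)) for c in range(len(he)))
        linear = sum(w * x for w, x in zip(Wm[r], concat))
        scores.append(bilinear + linear + b[r])
    return scores
```

Scoring all $O(l^2)$ spans this way yields an $l \times l \times n_r$ score tensor, which the decoding step then ranks.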


When decoding, each span is assigned the event role with the highest score, and we ranked all non-NA
spans by their scores in descending order. Spans in the sentence are accepted as event arguments
only if they neither cross the boundaries of higher-ranked accepted spans nor are nested within or contain them.
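The decoding procedure can be sketched as below, assuming NA is role index 0 and spans are inclusive `(i, j)` pairs. Rejecting any overlap, whether crossing or nested, reflects flat (BIO-style) arguments. `decode_biaffine` is a hypothetical name.

```python
def decode_biaffine(span_scores, na_index=0):
    """Greedy decoding of Biaffine span scores.

    span_scores: {(i, j): [score for each role]} with inclusive spans.
    Returns accepted (i, j, role) triples, highest-scoring first.
    """
    candidates = []
    for (i, j), scores in span_scores.items():
        role = max(range(len(scores)), key=lambda r: scores[r])
        if role != na_index:
            candidates.append((scores[role], i, j, role))
    candidates.sort(reverse=True)  # rank non-NA spans by score, descending
    accepted = []
    for _, i, j, role in candidates:
        # Keep the span only if it is disjoint from every accepted span.
        if all(j < i2 or j2 < i for i2, j2, _ in accepted):
            accepted.append((i, j, role))
    return accepted
```

Processing spans in score order means that, among overlapping candidates, the most confident one wins.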


4.3 Event Table Filling 


After obtaining the event types and event arguments in the document, we designed heuristic
strategies to convert the SEE results into DEE results. According to the conclusions drawn in Section 3.1, all event
types can be divided into two categories: one type one event (OTOE) and one type multiple events (OTME).


In the training data, OTOE events always appear in the plain text. The combination of valid event
arguments with the minimum internal distance② is selected as the event of the document. Leader Death is a special
event type in OTOE, since its event triggers are easy to find in the sentences, e.g., several Chinese words that
all mean “pass away”. The distance between triggers and event arguments is also considered when
computing the internal distance.
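The minimum-internal-distance selection can be sketched by enumerating one candidate mention per role. Following the footnote, the internal distance is the sum of pairwise distances between the chosen arguments. We assume that a mention's position is its character offset and that distance is the absolute offset difference. Trigger mentions for Leader Death can be passed under a hypothetical `"Trigger"` key and excluded from the output. All names here are hypothetical.

```python
from itertools import combinations, product

def internal_distance(positions):
    """Sum of pairwise distances between argument positions (footnote ②).
    Assumption: positions are character offsets and distance is |p - q|."""
    return sum(abs(p - q) for p, q in combinations(positions, 2))

def select_event(candidates, exclude=()):
    """candidates: {role: [(text, offset), ...]} extracted from one document.

    Returns {role: text} for the combination with minimal internal distance;
    roles listed in `exclude` (e.g. triggers) count toward the distance but
    are dropped from the result.
    """
    roles = [r for r in candidates if candidates[r]]
    best = min(product(*(candidates[r] for r in roles)),
               key=lambda combo: internal_distance([off for _, off in combo]))
    return {r: text for r, (text, _) in zip(roles, best) if r not in exclude}
```

Exhaustive enumeration grows with the product of the candidate counts per role, which should stay small for single-event (OTOE) documents.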


② We define the internal distance as the sum of the pairwise distances between all event arguments.
In the OTME scenario, events mainly appear in tables. Thus, we first use keywords, such
as the Chinese term for “number of overweight equity”, to locate the table, and then parse the table content
with the help of regular expressions and the event arguments extracted by the models. If no event is found by table
parsing, events are generated by the same method as in OTOE.


Additionally, there are some universal strategies. For example, we compared the longest common
subsequence (LCS) to determine whether a company name is a full name or an abbreviation. To preserve
special tokens (mostly <br>) in the final answers, we checked all answers that contain spaces but do not
appear in the original text, and restored them to their original form.
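The LCS comparison can be sketched with the classic dynamic program. The decision rule in `is_abbreviation` is our hypothetical instantiation, since the paper does not state its exact criterion. Under this rule, a shorter name counts as an abbreviation when all of its characters appear in order in the longer name.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def is_abbreviation(short: str, full: str) -> bool:
    """Hypothetical rule: `short` abbreviates `full` if it is shorter and
    every character of `short` appears in order in `full`."""
    return len(short) < len(full) and lcs_length(short, full) == len(short)
```

For instance, “中石油” is accepted as an abbreviation of “中国石油天然气股份有限公司” under this rule, because 中, 石 and 油 occur in that order.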


5. EVALUATION 


This section presents the experimental results on the evaluation data, together with a detailed analysis. We
compared the different variants of event detection and event argument extraction described in Section 4.


5.1 Data Set and Experimental Setup 


Experiments are conducted on the CCKS2020 Task 4-2 data set, which contains 9 event types.
In the training data, there are 3,956 documents containing 5,521 events, which are annotated by distant
supervision [25, 26]. The validation data and testing data, which contain 750 and 28,096 documents, respectively,
are used for online evaluation on Leaderboard A and Leaderboard B. To achieve better robustness and noise
resistance, we used 5-fold cross-validation to train each model.
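A document-level 5-fold split can be sketched as follows. Splitting by document rather than by sentence is our assumption. It keeps sentences of the same document out of both training and validation folds. The paper does not say how the five resulting models are combined, and `k_fold_splits` is a hypothetical name.

```python
import random

def k_fold_splits(n_docs: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, dev_idx) document-index lists for k-fold cross-validation."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)          # fixed seed for reproducible folds
    folds = [idx[f::k] for f in range(k)]     # k near-equal, disjoint folds
    for f in range(k):
        dev = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, dev
```

With the 3,956 training documents, each fold holds out about 791 documents for validation.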


In the experiments of event detection, we used Adam to optimize parameters with a learning rate of
0.001 and a minibatch size of 32. The hidden sizes of the BiLSTM and CNN are both 256. For event argument
extraction, the learning rate is set to 2e-5 in PLM layers and 2e-4 in other layers. The maximum numbers of
epochs for PLM-CRF, PLM-MRC and PLM-Biaffine are 5, 3 and 5, respectively. In particular, the output sizes of
both FFNNs in PLM-Biaffine are 256.


5.2 Experimental Results of Event Detection


Table 2 shows the results of the different models described in Section 4.1. MAX-based models achieved the
highest accuracy, as MAX can capture the most valuable information from all sentences in the document. On
the other hand, since predictive features can be diluted by noise in the document, ATT is not as good as MAX.
Among the three strategies, ONE shows the worst performance in both CNN-based and BiLSTM-based models,
which means that a single selected sentence is not enough to represent the full text in text classification.
It is worth noting that the data of this evaluation task mainly come from financial announcements, which
usually have a title that summarizes the full text. Thus, a simpler solution is to classify the document by its
title, so we also used the first sentence of each document for event detection. This First-Sentence strategy
works better than ONE, but falls short of MAX.


Table 2. Different models for event detection. 


Strategy CNN BiLSTM BERT
First-Sentence 0.98031 0.98158 0.98081 
ONE 0.97524 0.94045 - 
ATT 0.98233 0.97251 - 
MAX 0.98560 0.98988 - 


5.3 Experimental Results of Event Argument Extraction 


For all three paradigms of event argument extraction, we used BERT-wwm (Chinese) as the pre-trained language
model. To exploit global information, the results of event detection were regarded as prior
information, which was shared by all sentences from the same document. As shown in Table 3, models using the
prior information of event types always perform better, which shows that the global information of a
document is beneficial to event extraction and that it is necessary to detect the event type before event
argument extraction.
Table 3. Different model variants for event argument extraction. 
Model           F1-score   Training time/epoch
PLM-CRF†        0.82503    31 min
PLM-CRF         0.84033    31 min
PLM-MRC†        0.00000    63 min
PLM-MRC         0.84777    63 min
PLM-Biaffine†   0.82691    18 min
PLM-Biaffine    0.84772    18 min

Note: † means that no prior event type information is utilized.


Among all models, although PLM-MRC yields the best performance, PLM-Biaffine achieves a nearly identical
F1-score (0.84772 vs. 0.84777) and trains much faster (18 vs. 63 minutes per epoch). Thus, we selected
PLM-Biaffine as the basic model and further explored different PLMs to make full use of the implicit prior
information within PLMs. Table 4 shows that NEZHA-large performs best; therefore, we used only the
combination of NEZHA-large and PLM-Biaffine (NEZHA-Biaffine) in the final competition.
Table 4. Different PLMs for PLM-Biaffine.


PLM F1-score 
BERT-base 0.84615 
BERT-wwm 0.84772 
BERT-wwm-ext 0.84977 
ERNIE 0.84298 
RoBERTa-wwm-ext 0.85546 
RoBERTa-wwm-ext-large 0.86533 
NEZHA-large 0.86693 


5.4 Online Results 


According to the above experimental results, BiLSTM+MAX and NEZHA-Biaffine were selected as our
final models. The detailed results are listed in Table 5, which demonstrate the effectiveness of our framework (PIEE).
Moreover, since the online results for Bankruptcy Liquidation, Asset Loss, Accident, Leader Death and External
Indemnity were always 0 on the final testing data, we retrained the model on the data of the remaining event types,
which increased the F1-score from 0.66247 to 0.66996.


Table 5. Top 5 teams on Leaderboard A and Leaderboard B.


Leaderboard A            Leaderboard B
Team     F1-score        Team     F1-score
PIEE     0.83007         PIEE     0.66996
Rank 2   0.81411         Rank 2   0.65043
Rank 3   0.80578         Rank 3   0.63469
Rank 4   0.78422         Rank 4   0.61530
Rank 5   0.78359         Rank 5   0.60464


6. CONCLUSION AND FUTURE WORK 


In this paper, we proposed a Prior Information Enhanced Extraction framework (PIEE) for document-level
financial event extraction, which consists of three components: event detection, event argument extraction
and event table filling. In our solution, we show that it is necessary to detect event types first in DEE, since
the detected types serve as explicit prior information for event argument extraction. Moreover, we explore the
implicit prior information of different PLMs in event argument extraction. For Document-level Event Argument
Extraction in CCKS2020 Task 4-2, our system achieved F1-scores of 0.83007 and 0.66996 on Leaderboard A
and Leaderboard B, respectively, both of which are the highest scores, showing the advantages of our
framework.
Nevertheless, our framework has potential limitations and deficiencies that leave room for improvement.
First, PIEE is a pipeline framework, which might cause error propagation and accumulation. For
example, the performance of event argument extraction largely depends on the result of event detection.
Moreover, filling in the event tables with heuristic strategies is inflexible. We plan to address these
issues in future work.